Abstract

In this note we study the problem of sampling and reconstructing 
signals which are assumed to lie on or close to one of several subspaces of 
a Hilbert space. Importantly, we here consider a very general setting in 
which we allow infinitely many subspaces in infinite dimensional Hilbert 
spaces. This general approach allows us to unify many results derived 
recently in areas such as compressed sensing, affine rank minimisation 
and analog compressed sensing. 

Our main contribution is to show that a conceptually simple iterative
projection algorithm is able to recover signals from a union of subspaces
whenever the sampling operator satisfies a bi-Lipschitz embedding condition.
Importantly, this result holds for all Hilbert spaces and unions of
subspaces, as long as the sampling procedure satisfies the condition for the 
set of subspaces considered. In addition to recent results for finite unions 
of finite dimensional subspaces and infinite unions of subspaces in finite 
dimensional spaces, we also show that this bi-Lipschitz property can hold 
in an analog compressed sensing setting in which we have an infinite union 
of infinite dimensional subspaces living in infinite dimensional space. 



1 Introduction 

To motivate the general setting of this paper, we start with a review of the 
compressed sensing signal model in finite dimensions. In compressed sensing, 
sparse signals are considered. A class of $N$-dimensional signals $f$ in a Hilbert
space is said to be $K$-sparse if there is an orthonormal basis $\{\psi_i\}$ such that
the $N$-dimensional vector $x = [\langle f, \psi_i \rangle]_i$ has at most $K$ non-zero elements. More
generally, if $x_K$ is the best approximation to $x$ with no more than $K$ non-zero
elements and $x - x_K$ is small, then $x$ is said informally to be approximately
$K$-sparse.

In compressed sensing, a sparse signal is sampled by taking $M$ linear
measurements $y_j = \langle f, \phi_j \rangle$. In matrix notation, this can be written as

$$y = \Phi x, \tag{1}$$

where $y$ is the vector of measurements $\langle f, \phi_j \rangle$ and where $\Phi$ is the matrix whose
entry in row $j$ and column $i$ is $\langle \psi_i, \phi_j \rangle$. In practice, the measurement process is never perfect
and we have to account for measurement noise and inaccuracies. We thus assume
that the measurements (or samples) are of the form

$$y = \Phi x + e, \tag{2}$$

where $e$ is a measurement error.

Traditional sampling theory would predict that we require $N$ samples to be
able to reconstruct $x$ from the measurements. However, if $x$ is $K$-sparse or
approximately $K$-sparse, then we can often take fewer samples and still reconstruct
$x$ with near optimal precision [1], [2]. Importantly, reconstructing $x$ from $y$ can
often be done using fast polynomial time algorithms. One of the conditions that 
has been shown to be sufficient for the reconstruction of x with many different 
fast algorithms is that the measurement process satisfies what is known as the 
Restricted Isometry Condition of a given order, where the order of the condition 
is related to the sparsity K. 

The Restricted Isometry Constant of order $K$ is generally defined as the
smallest quantity $\delta_K$ that satisfies the condition

$$(1 - \delta_K)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_K)\|x\|_2^2, \tag{3}$$

for all $K$-sparse vectors $x$.
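
For very small problems, the constant in (3) can be computed by exhaustive search, which makes the definition concrete. The following is a minimal sketch, assuming a real-valued matrix stored as a NumPy array; the function name `restricted_isometry_constant` is hypothetical and the search is exponential in $K$, so it is only meant as an illustration.

```python
from itertools import combinations

import numpy as np


def restricted_isometry_constant(Phi, K):
    """Brute-force delta_K: the smallest delta such that
    (1 - delta)||x||^2 <= ||Phi x||^2 <= (1 + delta)||x||^2 for all K-sparse x."""
    N = Phi.shape[1]
    delta = 0.0
    # Supports of size exactly K suffice: a vector with fewer non-zero
    # elements also lives on some support of size K.
    for support in combinations(range(N), K):
        cols = Phi[:, list(support)]
        # Extreme eigenvalues of the Gram matrix bound ||Phi x||^2 / ||x||^2.
        eig = np.linalg.eigvalsh(cols.T @ cols)  # ascending order
        delta = max(delta, 1.0 - eig[0], eig[-1] - 1.0)
    return delta
```

The loop visits all $\binom{N}{K}$ supports, which is exactly the number of subspaces in the $K$-sparse model discussed next.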

The sparse compressed sensing model defines a set of subspaces associated
with the set of $K$-sparse vectors. Fixing the location of the $K$ non-zero elements
in a vector $x$ defines a $K$-dimensional subspace of $\mathbb{R}^N$. There are $\binom{N}{K}$ such
$K$-dimensional subspaces, one for each sparsity pattern. All $K$-sparse
vectors, that is, all vectors with no more than $K$ non-zero elements, thus lie in
the union of these $\binom{N}{K}$ subspaces. This interpretation of the sparse model led
to the consideration of more general union of subspaces (UoS) models, as in [3], [4] and
[5]. Such a generalization offers many advantages. For example, many types of
data are known to be sparse in some representation, but also exhibit additional
structure. These are so called structured sparse signals, an example of which
are images, which are not only approximately sparse in the wavelet domain but
also have wavelet coefficients that exhibit tree structures [6], [7]. Apart from
tree structured sparse models, structured sparse models include block sparse
signal models [5], [9], [10] and the simultaneous sparse approximation problem
[11], [12], [13], [14], [15]. All of these models can be readily seen as UoS models.

However, the idea of UoS is applicable beyond constrained sparse models. 
For example, signals sparse in an over-complete dictionary [16], [10], the union
of statistically independent subspaces as considered by Fletcher et al. [17] or
signals sparse in an analysis frame [18] can all be understood from this general
viewpoint. All of these examples were of finite unions of subspaces in finite
dimensional space. But there is nothing that stops us from considering infinite
dimensional spaces and infinite unions. In this case, the UoS model also
incorporates signal models such as the finite rate of innovation model [19], the
low rank matrix approximation model [20] and the analog compressed sensing
model [21].

We here consider this general setting where we allow infinite unions. In this 
setting, we derive a conceptually simple and efficient computational strategy 
to solve linear inverse problems. To achieve this, we build on previous work 
of [3] and [4], where theoretical properties of UoS models were studied. Of
importance are also the computational strategies previously suggested in [5]
(where the authors studied block-sparse models) and in [8] (where structured 
sparse signals were considered). 

We here make the following contribution. We show that, if the sampling 
strategy satisfies a certain bi-Lipschitz embedding property (closely related to 
the Restricted Isometry Property known in compressed sensing), then, in a 
fixed number of iterations, a relatively simple iterative projection algorithm can 
compute near optimal estimates of signals that lie on, or close to, a given UoS 
model. These results are similar to those derived for $K$-sparse signals in [22]
and for structured sparse models in [8]. Our contribution here is to show that 
these results extend to more general UoS models (whether finite or infinite) as 
long as the bi-Lipschitz embedding property holds. 

1.1 Sampling and the union of subspaces models 

Union of subspaces models have been considered in [3], [4] and [5]. To formally
define the UoS model in a general Hilbert space $\mathcal{H}$, consider a set of arbitrary
subspaces $\mathcal{A}_i \subset \mathcal{H}$. We then define the UoS as the set

$$\mathcal{A} = \bigcup_i \mathcal{A}_i. \tag{4}$$

In analogy with compressed sensing, sampling of an element $x \in \mathcal{H}$ is done
using a linear operator $\Phi : \mathcal{H} \to L$, where $L$ is some Hilbert space. We then
write the observations as

$$y = \Phi x + e, \tag{5}$$

where $e \in L$ is again an error term.

1.2 The bi-Lipschitz condition

In order to guarantee stability, it is necessary to impose a bi-Lipschitz condition
on $\Phi$ as a map from $\mathcal{A}$ to $L$.

Definition 1. We say that $\Phi$ is bi-Lipschitz on a set $\mathcal{A}$, if there exist constants
$0 < \alpha \le \beta$, such that for all $x_1, x_2 \in \mathcal{A}$

$$\alpha\|x_1 + x_2\|^2 \le \|\Phi(x_1 + x_2)\|^2 \le \beta\|x_1 + x_2\|^2. \tag{6}$$

Because $\mathcal{A}$ is a union of subspaces, $-x_2 \in \mathcal{A}$ whenever $x_2 \in \mathcal{A}$, so the
same bounds hold for differences $x_1 - x_2$.

The bi-Lipschitz constants of $\Phi$ on $\mathcal{A}$ are the largest $\alpha$ and smallest $\beta$ for which
the above inequalities hold for all $x_1, x_2 \in \mathcal{A}$.

Whilst $\beta$ is the square of the Lipschitz constant of the map $\Phi$ (as a map from
$\mathcal{A}$ to $L$), $1/\alpha$ is the square of the Lipschitz constant of the inverse of $\Phi$ defined
as a map from $\Phi\mathcal{A} \subset L$ to $\mathcal{A}$. Note that the requirement $\alpha > 0$ guarantees
that $\Phi$ is one to one as a map from $\mathcal{A}$ to $L$. Therefore, the
inverse of $\Phi$ is well defined as a function from $\Phi\mathcal{A}$ to the set $\mathcal{A}$ whenever $\alpha > 0$.
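
For a finite union of finite dimensional subspaces, the constants in Definition 1 can be computed directly, because $x_1 + x_2$ ranges over the sums $\mathcal{A}_i + \mathcal{A}_j$. The sketch below is an illustration only: it assumes real matrices, assumes each subspace is given by a matrix whose columns span it, and uses the hypothetical name `bilipschitz_constants`.

```python
import numpy as np


def bilipschitz_constants(Phi, bases, tol=1e-10):
    """Largest alpha and smallest beta with
    alpha||x1 + x2||^2 <= ||Phi(x1 + x2)||^2 <= beta||x1 + x2||^2
    for x1, x2 in the union of the column spans of `bases`."""
    alpha, beta = np.inf, 0.0
    for i in range(len(bases)):
        for j in range(i, len(bases)):
            span = np.hstack([bases[i], bases[j]])
            U, s, _ = np.linalg.svd(span, full_matrices=False)
            Q = U[:, s > tol * s.max()]  # orthonormal basis of A_i + A_j
            PQ = Phi @ Q
            ev = np.linalg.eigvalsh(PQ.T @ PQ)  # ascending order
            alpha = min(alpha, max(ev[0], 0.0))
            beta = max(beta, ev[-1])
    return alpha, beta
```

A returned `alpha` of zero signals that two distinct elements of the union are mapped to the same observation, so $\Phi$ is not invertible on that union.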

1.3 Proximal sets and projections 

When dealing with infinite dimensions and infinite unions, extra care has to 
be taken. In order to guarantee the existence of (possibly non-unique) best 
approximations of elements in $\mathcal{H}$ with elements from $\mathcal{A}$, additional assumptions
on $\mathcal{A}$ are required. In addition to the assumption that $\mathcal{A}$ is a closed set, we
assume that the set $\mathcal{A}$ is proximal, that is, that for all $x \in \mathcal{H}$ the set

$$p_{\mathcal{A}}(x) = \{\tilde{x} : \tilde{x} \in \mathcal{A},\ \|x - \tilde{x}\| = \inf_{\hat{x} \in \mathcal{A}} \|x - \hat{x}\|\} \tag{7}$$

is non-empty. For proximal sets $\mathcal{A}$ we can therefore define a projection as any
point $x_{\mathcal{A}}$ that satisfies

$$\|x - x_{\mathcal{A}}\| = \inf_{\hat{x} \in \mathcal{A}} \|x - \hat{x}\|. \tag{8}$$

Note that $x_{\mathcal{A}}$ is the orthogonal projection of $x$ onto one of the subspaces $\mathcal{A}_i$.
We write this projection as

$$P_{\mathcal{A}}(x) = S(p_{\mathcal{A}}(x)), \tag{9}$$

where $S$ is a selection operator that returns a single element of the set $p_{\mathcal{A}}(x)$.
How this element is chosen in practice does not influence the theoretical results
derived here so that we do not specify any particular approach in this paper.
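
For a finite union, the projection in (9) can be sketched as follows: project orthogonally onto every subspace and keep the closest result. This is a minimal illustration, assuming real vectors and subspaces given by matrices with orthonormal columns; the name `project_onto_union` and the tie-breaking rule (first subspace wins) are assumptions standing in for the unspecified selection operator $S$.

```python
import numpy as np


def project_onto_union(x, bases):
    """Return a closest point to x in the union of the column spans of `bases`.
    Each entry of `bases` must have orthonormal columns."""
    best, best_dist = None, np.inf
    for Q in bases:
        candidate = Q @ (Q.T @ x)  # orthogonal projection onto one subspace
        dist = np.linalg.norm(x - candidate)
        if dist < best_dist:  # strict inequality: ties go to the first subspace
            best, best_dist = candidate, dist
    return best
```

With infinitely many subspaces this enumeration is not available, which is why the paper requires an efficient problem-specific projection instead.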



2 The optimal solution 

In order to talk about optimal solutions, we require the existence of a projection
of a point $y \in L$ onto the set $\Phi\mathcal{A}$. Note that we assume $\mathcal{A}$ to be closed,
which implies that $\Phi\mathcal{A}$ is closed if $\Phi$ is bi-Lipschitz. However, as stated above,
closedness of $\Phi\mathcal{A}$ is not sufficient to show that the projection onto $\Phi\mathcal{A}$ exists.
In this section we therefore also assume that $\Phi\mathcal{A}$ is proximal.
More formally, consider

$$\inf_{\hat{x} \in \mathcal{A}} \|y - \Phi\hat{x}\|. \tag{10}$$

As $\Phi\mathcal{A}$ is assumed to be proximal, we can define optimal solutions as those
elements $x_{opt} \in \mathcal{A}$ for which

$$\|y - \Phi x_{opt}\| = \inf_{\hat{x} \in \mathcal{A}} \|y - \Phi\hat{x}\|. \tag{11}$$

Alternatively, instead of considering proximal sets $\Phi\mathcal{A}$, we could define
$\epsilon$-optimal points as those points $x^{\epsilon}_{opt} \in \mathcal{A}$ for which

$$\|y - \Phi x^{\epsilon}_{opt}\| \le \inf_{\hat{x} \in \mathcal{A}} \|y - \Phi\hat{x}\| + \epsilon. \tag{12}$$

The results derived below then still hold but will include additional $\epsilon$ terms.
To avoid carrying around these additional terms, we here assume that $\Phi\mathcal{A}$ is a
proximal subset of $L$.
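
When the union is finite and small, an optimal solution in the sense of (11) can be found by solving one least squares problem per subspace. The sketch below is only an illustration of that brute force search, assuming real data and subspaces given by basis matrices; `brute_force_optimum` is a hypothetical name.

```python
import numpy as np


def brute_force_optimum(y, Phi, bases):
    """Minimise ||y - Phi x|| over the union of the column spans of `bases`
    by solving a least squares problem on every subspace."""
    best, best_res = None, np.inf
    for B in bases:
        coeff, *_ = np.linalg.lstsq(Phi @ B, y, rcond=None)
        x = B @ coeff
        res = np.linalg.norm(y - Phi @ x)
        if res < best_res:
            best, best_res = x, res
    return best, best_res
```

The cost grows linearly with the number of subspaces, so for the $K$-sparse model it grows like $\binom{N}{K}$.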

The bi-Lipschitz condition guarantees that $\Phi$ is one to one as a function
from $\mathcal{A}$ to $L$, that is, it maps distinct points from $\mathcal{A}$ into distinct points in
$L$. We are therefore able, at least in theory, to invert $\Phi$ on $\mathcal{A}$. The condition
also guarantees stability in that, for any $x \in \mathcal{A}$, if we are given an observation
$y = \Phi x + \Phi\tilde{x} + e$, where $e \in L$ and $\tilde{x} \in \mathcal{H}$ are general errors, then we could, at
least in theory, recover a good approximation of $x$ as follows. We let $\hat{y}$ be the
projection of $y$ onto the closest element in $\Phi\mathcal{A}$. We then look for the unique
$\hat{x} \in \mathcal{A}$ for which $\hat{y} = \Phi\hat{x}$. As will be shown more rigorously below, the bi-Lipschitz
property of $\Phi$ then guarantees that $\hat{x}$ is close to $x$.

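We now show that all $x_{opt}$ are basically optimal if the bi-Lipschitz property
holds, that is, we cannot define an estimate that performs substantially better.

Let us first derive an upper bound for the error. Note that by definition
of $x_{opt}$, $\|y - \Phi x_{opt}\| \le \|y - \Phi x_{\mathcal{A}}\|$, where we define $x_{\mathcal{A}} = P_{\mathcal{A}}(x)$. Defining
$e_{opt} = y - \Phi x_{opt}$ and $e_{\mathcal{A}} = y - \Phi x_{\mathcal{A}}$ we thus have

$$\begin{aligned}
\|x - x_{opt}\| &\le \|x_{\mathcal{A}} - x_{opt}\| + \|x - x_{\mathcal{A}}\| \\
&\le \frac{1}{\sqrt{\alpha}}\|\Phi(x_{\mathcal{A}} - x_{opt})\| + \|x - x_{\mathcal{A}}\| \\
&= \frac{1}{\sqrt{\alpha}}\|e_{\mathcal{A}} - e_{opt}\| + \|x - x_{\mathcal{A}}\| \\
&\le \frac{1}{\sqrt{\alpha}}\|e_{\mathcal{A}}\| + \frac{1}{\sqrt{\alpha}}\|e_{opt}\| + \|x - x_{\mathcal{A}}\| \\
&\le \frac{2}{\sqrt{\alpha}}\|e_{\mathcal{A}}\| + \|x - x_{\mathcal{A}}\|,
\end{aligned}$$

where the second inequality is due to the bi-Lipschitz property and the last
inequality is due to the fact that $\|e_{opt}\| \le \|e_{\mathcal{A}}\|$.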

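We furthermore have the following 'worst case' lower bound.

Theorem 1. For each $x$ there exists an $e$, such that

$$\|x - x_{opt}\| \ge \sqrt{\frac{0.5}{\beta}}\,\|e_{\mathcal{A}}\| + \|x - x_{\mathcal{A}}\|.$$

Proof. We have the lower bound

$$\begin{aligned}
\|x - x_{opt}\|^2 &= \|x_{\mathcal{A}} - x_{opt}\|^2 + \|x - x_{\mathcal{A}}\|^2 + 2\langle (x_{\mathcal{A}} - x_{opt}), (x - x_{\mathcal{A}}) \rangle \\
&\ge \frac{1}{\beta}\|\Phi(x_{\mathcal{A}} - x_{opt})\|^2 + \|x - x_{\mathcal{A}}\|^2 + 2\langle (x_{\mathcal{A}} - x_{opt}), (x - x_{\mathcal{A}}) \rangle,
\end{aligned}$$

where from now on we simplify the notation and write $\langle \cdot, \cdot \rangle$ for the real part of
the inner product $\mathrm{Re}\langle \cdot, \cdot \rangle$.

Let $C_i$ be the cone of elements $y \in L$ for which $x_{opt} \in \mathcal{A}_i$. Because $x_{\mathcal{A}}$ is the
orthogonal projection of $x$ onto the closest subspace, if $x_{\mathcal{A}} \in \mathcal{A}_i$, then $x - x_{\mathcal{A}}$ is
orthogonal to $\mathcal{A}_i$. Thus, if $x_{\mathcal{A}} \in \mathcal{A}_i$ and if $y \in C_i$, then

$$\langle (x_{\mathcal{A}} - x_{opt}), (x - x_{\mathcal{A}}) \rangle = 0. \tag{13}$$

Also, for all $y \in C_i$, because $e_{opt} = y - \Phi x_{opt}$ is orthogonal to $\Phi x_{\mathcal{A}} - \Phi x_{opt}$,

$$\frac{1}{\beta}\|\Phi(x_{\mathcal{A}} - x_{opt})\|^2 + \frac{1}{\beta}\|e_{opt}\|^2 = \frac{1}{\beta}\|e_{\mathcal{A}}\|^2, \tag{14}$$

so that for all $y \in C_i$

$$\|x - x_{opt}\|^2 \ge \frac{1}{\beta}\|e_{\mathcal{A}}\|^2 - \frac{1}{\beta}\|e_{opt}\|^2 + \|x - x_{\mathcal{A}}\|^2.$$

We can now choose $e = c\Phi x_{\mathcal{A}}$, where $c > 1$ is chosen large enough for $y$ to be
in $C_i$. Because $e_{opt}$ is orthogonal to $\Phi\mathcal{A}_i$, $\|e_{opt}\|$ is constant as a function of $c$,
whilst $\|e_{\mathcal{A}}\|$ increases for $c > 1$. We can thus choose $c$ (and thus $e$) such that
$y \in C_i$ and

$$-\|e_{opt}\|^2 \ge -0.5\|e_{\mathcal{A}}\|^2 + \sqrt{2\beta}\,\|e_{\mathcal{A}}\|\|x - x_{\mathcal{A}}\|, \tag{15}$$

so that for all $x$ there is an $e$ such that

$$\|x - x_{opt}\|^2 \ge \frac{0.5}{\beta}\|e_{\mathcal{A}}\|^2 + \|x - x_{\mathcal{A}}\|^2 + \sqrt{\frac{2}{\beta}}\,\|e_{\mathcal{A}}\|\|x - x_{\mathcal{A}}\|,$$

from which the theorem follows. □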



3 The Iterative Projection Algorithm 

Calculating $x_{opt}$ is highly non-trivial for most $\Phi$ and $\mathcal{A}$. We therefore propose
an iterative algorithm and show that under certain conditions on $\alpha$ and $\beta$ we can
efficiently calculate solutions whose error is of the same order as that achieved
by $x_{opt}$. In order for our algorithm to be applicable, we require that we are able
to efficiently calculate the projection of any $x \in \mathcal{H}$ onto the closest $\mathcal{A}_i$ (which
therefore has to be well defined).

The Iterative Projection Algorithm is a generalization of the Iterative Hard
Thresholding algorithm of [23], [24] and [22] to general UoS models.

Assume $\mathcal{A}$ is proximal. Given $y$ and $\Phi$, let $x^0 = 0$. The Iterative Projection
Algorithm is the iterative procedure defined by the recursion

$$x^{n+1} = P_{\mathcal{A}}(x^n + \mu\Phi^*(y - \Phi x^n)), \tag{16}$$

where $\mu$ is a step size and the non-linear operator $P_{\mathcal{A}}(a)$ is defined in subsection 1.3.

In many problems, calculation of $P_{\mathcal{A}}(a)$ is much easier than a brute force
search for $x_{opt}$. For example, in the $K$-sparse model, $P_{\mathcal{A}}(a)$ simply keeps the
largest (in magnitude) $K$ elements of $a$ and sets the other elements to zero,
whilst in the low rank matrix approximation problem, different efficient projections
have been defined in [20]. Furthermore, the above algorithm only requires
the application of $\Phi$ and its adjoint, which can often be computed efficiently.
Importantly, the next result shows that under certain conditions, not only does
the algorithm calculate near optimal solutions, it does so in a fixed number of
iterations (depending only on a form of signal to noise ratio).
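
The recursion (16) is short enough to sketch in full. The following is a minimal illustration for real-valued finite dimensional problems, where the adjoint is the matrix transpose; the names `keep_largest` and `iterative_projection` are hypothetical, and the projection is passed in as a function so that any UoS model with a computable projection can be used.

```python
import numpy as np


def keep_largest(a, K):
    """Projection onto the K-sparse model: keep the K largest magnitudes."""
    out = np.zeros_like(a)
    idx = np.argsort(np.abs(a))[-K:]
    out[idx] = a[idx]
    return out


def iterative_projection(y, Phi, project, mu, n_iter):
    """Run x <- P_A(x + mu * Phi^T (y - Phi x)) for n_iter steps from x = 0."""
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = project(x + mu * Phi.T @ (y - Phi @ x))
    return x
```

A call such as `iterative_projection(y, Phi, lambda a: keep_largest(a, K), mu, n_iter)` reproduces Iterative Hard Thresholding; swapping the `project` argument changes the signal model without touching the loop.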
We have the following main result. 

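Theorem 2. Let $\mathcal{A}$ be a proximal subset of $\mathcal{H}$ and let $y = \Phi x + e$, where $x$ is
arbitrary. Assume $\Phi$ is bi-Lipschitz as a map from $\mathcal{A}$ to $L$ with constants $\alpha$ and
$\beta$. If $\beta \le \frac{1}{\mu} < 1.5\alpha$, then, after

$$k^\star = 2\,\frac{\ln\left(\delta\,\frac{\|e_{\mathcal{A}}\|}{\|x_{\mathcal{A}}\|}\right)}{\ln(2/(\mu\alpha) - 2)} \tag{17}$$

iterations, the Iterative Projection Algorithm calculates a solution $x^{k^\star}$ satisfying

$$\|x - x^{k^\star}\| \le (c^{0.5} + \delta)\|e_{\mathcal{A}}\| + \|x_{\mathcal{A}} - x\|, \tag{18}$$

where $c \le \frac{4}{3\alpha - 2/\mu}$ and $e_{\mathcal{A}} = \Phi(x - x_{\mathcal{A}}) + e$.

Note that this bound is of the same order as that derived for $x_{opt}$.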
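
To get a feel for the numbers in the theorem, the sketch below evaluates the iteration count and the constant $c$. It assumes the iteration count reads $k^\star = 2\ln(\delta\|e_{\mathcal{A}}\|/\|x_{\mathcal{A}}\|)/\ln(2/(\mu\alpha) - 2)$ and that $c \le 4/(3\alpha - 2/\mu)$, which is how the contraction argument in the proof works out; rounding up to an integer and the function names are additions made here for illustration.

```python
import math


def iteration_count(alpha, mu, delta, e_norm, xA_norm):
    """Smallest integer k with rate**(k/2) * xA_norm <= delta * e_norm,
    where rate = 2/(mu*alpha) - 2 is the per-iteration contraction factor."""
    rate = 2.0 / (mu * alpha) - 2.0
    if not 0.0 <= rate < 1.0:
        raise ValueError("need alpha <= 1/mu < 1.5*alpha")
    if delta * e_norm >= xA_norm:
        return 0  # the starting point x^0 = 0 already meets the target
    if rate == 0.0:
        return 1  # isometric case: a single step removes the signal term
    return math.ceil(2.0 * math.log(delta * e_norm / xA_norm) / math.log(rate))


def error_constant(alpha, mu):
    """Upper bound on c obtained by summing the geometric series in the proof."""
    return 4.0 / (3.0 * alpha - 2.0 / mu)
```

The count depends on the signal only through the ratio $\|x_{\mathcal{A}}\|/\|e_{\mathcal{A}}\|$, which is the 'form of signal to noise ratio' referred to above.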

The above theorem has been proved for the $K$-sparse model in [22] and
for constrained sparse models in [8]. Our main contribution is to show that it
holds for general UoS constrained inverse problems¹, as long as the bi-Lipschitz
property holds with appropriate constants.

To derive the result, we pursue a slightly different approach to that in [22] 
and [8] and instead follow the ideas of [25] . The proof is based on the following 
lemma. 

Lemma 3. If $\frac{1}{\mu} \ge \beta$ then, using $x^{n+1} = P_{\mathcal{A}}(x^n + \mu\Phi^*(y - \Phi x^n))$, we have

$$\|y - \Phi x^{n+1}\|^2 - \|y - \Phi x^n\|^2 \le -\langle (x_{\mathcal{A}} - x^n), g \rangle + \frac{1}{\mu}\|x_{\mathcal{A}} - x^n\|^2, \tag{19}$$

where $g = 2\Phi^*(y - \Phi x^n)$.

Proof. The left hand side of the inequality of the lemma can be bounded by

$$\begin{aligned}
\|y - \Phi x^{n+1}\|^2 - \|y - \Phi x^n\|^2
&= -\langle (x^{n+1} - x^n), g \rangle + \|\Phi(x^{n+1} - x^n)\|^2 \\
&\le -\langle (x^{n+1} - x^n), g \rangle + \frac{1}{\mu}\|x^{n+1} - x^n\|^2.
\end{aligned}$$

¹ It might also be worth noting that the proof of Theorem 2 is not only valid for unions
of subspaces, but holds for arbitrary subsets $\mathcal{A} \subset \mathcal{H}$ for which $\Phi$ satisfies the bi-Lipschitz
requirement. A more detailed discussion of this fact is left for an upcoming publication.

We will now show that $x^{n+1} = P_{\mathcal{A}}(x^n + \frac{\mu}{2}g)$ minimizes $-\langle (\hat{x} - x^n), g \rangle +
\frac{1}{\mu}\|\hat{x} - x^n\|^2$ over all $\hat{x} \in \mathcal{A}$ so that $x_{\mathcal{A}} \in \mathcal{A}$ implies that

$$-\langle (x^{n+1} - x^n), g \rangle + \frac{1}{\mu}\|x^{n+1} - x^n\|^2 \le -\langle (x_{\mathcal{A}} - x^n), g \rangle + \frac{1}{\mu}\|x_{\mathcal{A}} - x^n\|^2, \tag{20}$$

from which the lemma will follow.

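We write the infimum of $-\langle (\hat{x} - x^n), g \rangle + \frac{1}{\mu}\|\hat{x} - x^n\|^2$ as

$$\begin{aligned}
&\inf_{\hat{x} \in \mathcal{A}} \left( -\langle \hat{x}, g \rangle + \langle x^n, g \rangle + \frac{1}{\mu}\|\hat{x} - x^n\|^2 \right) \\
&\propto \inf_{\hat{x} \in \mathcal{A}} \left( -\mu\langle \hat{x}, g \rangle + \langle \hat{x}, \hat{x} \rangle + \|x^n\|^2 - 2\langle \hat{x}, x^n \rangle \right) \\
&\propto \inf_{\hat{x} \in \mathcal{A}} \left( -\mu\langle \hat{x}, g \rangle + \langle \hat{x}, \hat{x} \rangle - 2\langle \hat{x}, x^n \rangle \right) \\
&\propto \inf_{\hat{x} \in \mathcal{A}} \|\hat{x} - x^n - \tfrac{\mu}{2}g\|^2,
\end{aligned}$$

where $\propto$ denotes equality up to a positive factor and additive terms that do not
depend on $\hat{x}$, and where the infimum in the last line is attained at $x^{n+1}$ by the
definition $x^{n+1} = P_{\mathcal{A}}(x^n + \mu\Phi^*(y - \Phi x^n))$. Thus, the infimum of
$-\langle (\hat{x} - x^n), g \rangle + \frac{1}{\mu}\|\hat{x} - x^n\|^2$ is proportional to
$\inf_{\hat{x} \in \mathcal{A}} \|\hat{x} - x^n - \frac{\mu}{2}g\|^2$ so that $x^{n+1}$ simultaneously minimises both quantities.

□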
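
The completing-the-square step in this proof can be checked numerically. The sketch below is an illustration for real vectors and the $K$-sparse model only; all function names are hypothetical. It uses the identity $-\langle d, g \rangle + \frac{1}{\mu}\|d\|^2 = \frac{1}{\mu}\|d - \frac{\mu}{2}g\|^2 - \frac{\mu}{4}\|g\|^2$ with $d = \hat{x} - x^n$.

```python
import numpy as np


def surrogate(x_hat, x_n, g, mu):
    """The quantity minimised in the lemma: -<x_hat - x_n, g> + ||x_hat - x_n||^2 / mu."""
    d = x_hat - x_n
    return -d @ g + (d @ d) / mu


def completed_square(x_hat, x_n, g, mu):
    """Same quantity written as a scaled squared distance minus a constant."""
    r = x_hat - x_n - 0.5 * mu * g
    return (r @ r) / mu - 0.25 * mu * (g @ g)


def keep_largest(a, K):
    """Projection onto the K-sparse model: keep the K largest magnitudes."""
    out = np.zeros_like(a)
    idx = np.argsort(np.abs(a))[-K:]
    out[idx] = a[idx]
    return out
```

Since the two expressions agree for every $\hat{x}$, the point of $\mathcal{A}$ closest to $x^n + \frac{\mu}{2}g$ also minimises the surrogate, which is the claim used to obtain (20).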

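Proof of Theorem 2. Let $x_{\mathcal{A}} = P_{\mathcal{A}}(x)$, so that the triangle inequality implies
that

$$\|x - x^{n+1}\| \le \|x_{\mathcal{A}} - x^{n+1}\| + \|x_{\mathcal{A}} - x\|. \tag{21}$$

The square of the first term on the right is bounded using the bi-Lipschitz
property of $\Phi$

$$\|x_{\mathcal{A}} - x^{n+1}\|^2 \le \frac{1}{\alpha}\|\Phi(x_{\mathcal{A}} - x^{n+1})\|^2. \tag{22}$$

We expand this, so that

$$\begin{aligned}
\|\Phi(x_{\mathcal{A}} - x^{n+1})\|^2 &= \|y - \Phi x^{n+1} - e_{\mathcal{A}}\|^2 \\
&= \|y - \Phi x^{n+1}\|^2 + \|e_{\mathcal{A}}\|^2 - 2\langle e_{\mathcal{A}}, (y - \Phi x^{n+1}) \rangle \\
&\le \|y - \Phi x^{n+1}\|^2 + \|e_{\mathcal{A}}\|^2 + \|e_{\mathcal{A}}\|^2 + \|y - \Phi x^{n+1}\|^2 \\
&= 2\|y - \Phi x^{n+1}\|^2 + 2\|e_{\mathcal{A}}\|^2,
\end{aligned} \tag{23}$$

where the inequality follows from $-2\langle e_{\mathcal{A}}, (y - \Phi x^{n+1}) \rangle \le 2\|e_{\mathcal{A}}\|\|y - \Phi x^{n+1}\|
\le \|e_{\mathcal{A}}\|^2 + \|y - \Phi x^{n+1}\|^2$.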

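We will now show that under the Lipschitz assumption of the theorem, the
first term on the right is bounded by

$$\|y - \Phi x^{n+1}\|^2 \le \left(\frac{1}{\mu} - \alpha\right)\|x_{\mathcal{A}} - x^n\|^2 + \|e_{\mathcal{A}}\|^2. \tag{24}$$

To show this, we write

$$\begin{aligned}
\|y - \Phi x^{n+1}\|^2 - \|y - \Phi x^n\|^2
&\le -2\langle (x_{\mathcal{A}} - x^n), \Phi^*(y - \Phi x^n) \rangle + \frac{1}{\mu}\|x_{\mathcal{A}} - x^n\|^2 \\
&= -2\langle (x_{\mathcal{A}} - x^n), \Phi^*(y - \Phi x^n) \rangle + \alpha\|x_{\mathcal{A}} - x^n\|^2 + \left(\frac{1}{\mu} - \alpha\right)\|x_{\mathcal{A}} - x^n\|^2 \\
&\le -2\langle (x_{\mathcal{A}} - x^n), \Phi^*(y - \Phi x^n) \rangle + \|\Phi(x_{\mathcal{A}} - x^n)\|^2 + \left(\frac{1}{\mu} - \alpha\right)\|x_{\mathcal{A}} - x^n\|^2 \\
&= \|y - \Phi x_{\mathcal{A}}\|^2 - \|y - \Phi x^n\|^2 + \left(\frac{1}{\mu} - \alpha\right)\|x_{\mathcal{A}} - x^n\|^2 \\
&= \|e_{\mathcal{A}}\|^2 - \|y - \Phi x^n\|^2 + \left(\frac{1}{\mu} - \alpha\right)\|x_{\mathcal{A}} - x^n\|^2,
\end{aligned} \tag{25}$$

where the first inequality is due to Lemma 3. Adding $\|y - \Phi x^n\|^2$ to both sides gives (24).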
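We have thus shown that

$$\|x_{\mathcal{A}} - x^{n+1}\|^2 \le 2\left(\frac{1}{\mu\alpha} - 1\right)\|x_{\mathcal{A}} - x^n\|^2 + \frac{4}{\alpha}\|e_{\mathcal{A}}\|^2. \tag{26}$$

Under the condition of the Theorem, $2\left(\frac{1}{\mu\alpha} - 1\right) < 1$, so that we can iterate
the above expression

$$\|x_{\mathcal{A}} - x^k\|^2 \le \left(2\left(\frac{1}{\mu\alpha} - 1\right)\right)^k \|x_{\mathcal{A}}\|^2 + c\,\|e_{\mathcal{A}}\|^2, \tag{27}$$

where $c \le \frac{4}{3\alpha - 2/\mu}$.

In conclusion, we have

$$\begin{aligned}
\|x - x^k\| &\le \sqrt{\left(\frac{2}{\mu\alpha} - 2\right)^k \|x_{\mathcal{A}}\|^2 + c\,\|e_{\mathcal{A}}\|^2} + \|x_{\mathcal{A}} - x\| \\
&\le \left(\frac{2}{\mu\alpha} - 2\right)^{k/2} \|x_{\mathcal{A}}\| + c^{0.5}\|e_{\mathcal{A}}\| + \|x_{\mathcal{A}} - x\|,
\end{aligned} \tag{28}$$

which means that after

$$k^\star = 2\,\frac{\ln\left(\delta\,\frac{\|e_{\mathcal{A}}\|}{\|x_{\mathcal{A}}\|}\right)}{\ln(2/(\mu\alpha) - 2)}$$

iterations we have

$$\|x - x^{k^\star}\| \le (c^{0.5} + \delta)\|e_{\mathcal{A}}\| + \|x_{\mathcal{A}} - x\|. \tag{29}$$

□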



3.1 A remark on $e_{\mathcal{A}}$

For readers familiar with the literature on compressed sensing a remark is in
order. In our general result, we have written the bound on the result in terms of
$\|e_{\mathcal{A}}\| = \|\Phi(x - x_{\mathcal{A}}) + e\|$. This is the most general statement in which we do not
assume additional structure on $x - x_{\mathcal{A}}$ and $\Phi$. This differs from results in sparse
inverse problems, where, under the bi-Lipschitz property, $\|e_{\mathcal{A}}\|$ is proportional to
$\|x - x_K\|_2 + \frac{\|x - x_K\|_1}{\sqrt{K}}$. Here $x_K$ is the best $K$-term approximation to $x$. It also
differs from results derived in [8] where $\mathcal{A}$ satisfies certain nesting properties and
where a Restricted Amplification property is used to bound $\|e_{\mathcal{A}}\|$ by a function
of $x - x_{\mathcal{A}}$. Unfortunately, in the general setting of this paper, such a bound
is not possible without additional assumptions and the best one could hope for
would be to bound $\|e_{\mathcal{A}}\|$ by $\|\Phi\|\|x - x_{\mathcal{A}}\| + \|e\|$.



4 Examples of bi-Lipschitz embeddings 

The bi-Lipschitz property depends on both $\Phi$ and $\mathcal{A}$. In this section we will
study three particular cases from the literature. For the first two cases, bi- 
Lipschitz maps have already been studied and we here review the main results 
before deriving a new result that demonstrates how such properties can be 
proved even in an infinite dimensional setting. 



4.1 Finite Unions of Finite dimensional Subspaces 

We start with the finite dimensional setting and with unions of finite dimensional 
subspaces. In particular, let $\mathcal{A}$ be the union of $L < \infty$ subspaces each of
dimension no more than $K$ and let $\Phi\mathcal{A} \subset \mathbb{R}^M$. This is an important special
case of UoS models which covers many of the problems studied in practice,
such as the $K$-sparse models used in compressed sensing [1], [2], block sparse
signal models [5], [9], [10], the simultaneous sparse approximation problem [11],
[12], [13], [14], [15], signals sparse in an over-complete dictionary [16], [10], the
union of statistically independent subspaces as considered by Fletcher et al. [17]
and signals sparse in an analysis frame [18]. Finite unions of finite dimensional
subspaces have therefore been studied in, for example, [4] where the following
result was derived.

Theorem 4. For any $t > 0$, let

$$M \ge \frac{2}{c\,\delta_{\mathcal{A}}}\left(\ln(2L) + 2K\ln\left(\frac{12}{\delta_{\mathcal{A}}}\right) + t\right), \tag{30}$$

then there exist a $\Phi$ and a constant $c > 0$ such that

$$(1 - \delta_{\mathcal{A}}(\Phi))\|y_1 - y_2\|_2^2 \le \|\Phi(y_1 - y_2)\|_2^2 \le (1 + \delta_{\mathcal{A}}(\Phi))\|y_1 - y_2\|_2^2 \tag{31}$$

holds for all $y_1, y_2$ from the union of $L$ arbitrary $K$ dimensional subspaces $\mathcal{A}$.
What is more, if $\Phi$ is an $M \times N$ matrix generated by randomly drawing i.i.d.
entries from an appropriately scaled subgaussian distribution², then this matrix
satisfies equation (31) with probability at least

$$1 - e^{-t}. \tag{32}$$

The constant $c$ then only depends on the distribution of the entries in $\Phi$ and is
$c = \frac{7}{18}$ if the entries of $\Phi$ are i.i.d. normal.

² Examples of these distributions include the Gaussian distribution and random variables
that take the values $\pm 1$ (suitably scaled) with equal probability [26], [16].

4.2 Infinite Unions of Finite dimensional Subspaces in $\mathbb{R}^N$

Recently, similar results could also be derived for a union of infinitely many
subspaces. In [20] minimum rank constrained linear matrix valued inverse problems
are studied. These problems are another instance of the linear inverse problem
studied in this paper and can be stated as follows: Find a matrix $X \in \mathbb{R}^{m \times n}$
with rank no more than $K$, such that $y = P(X)$, where $P(\cdot)$ is a linear function
that maps $\mathbb{R}^{m \times n}$ into $\mathbb{R}^M$. Vectorising $X$ as an element $x \in \mathbb{R}^{mn}$ and writing
$P$ in matrix form, we obtain a linear inverse problem in which $\mathcal{A}$ is the set of
vectorised matrices with rank at most $K$. This problem was solved with the
Iterative Projection Algorithm in [27] where it was also shown that

Theorem 5. If $P$ is a random nearly isometrically distributed linear map³,
then with probability $1 - e^{-C_1 N}$,

$$(1 - \delta)\|X_1 - X_2\|_F \le \|P(X_1 - X_2)\| \le (1 + \delta)\|X_1 - X_2\|_F \tag{33}$$

for all rank $K$ matrices $X_1 \in \mathbb{R}^{m \times n}$ and $X_2 \in \mathbb{R}^{m \times n}$, whenever $N \ge
C_0 K(m + n)\log(mn)$, where $C_1$ and $C_0$ are constants depending on $\delta$ only.
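
For the rank constrained model, the projection onto $\mathcal{A}$ needed by the Iterative Projection Algorithm can be sketched with a truncated singular value decomposition. This is the standard best rank-$K$ approximation in the Frobenius norm (Eckart-Young), used here as an assumption; it is not necessarily one of the projections defined in [20], and `project_rank` is a hypothetical name.

```python
import numpy as np


def project_rank(X, K):
    """Closest matrix of rank at most K to X in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the K largest singular values and their singular vectors.
    return (U[:, :K] * s[:K]) @ Vt[:K]
```

Applied to the reshaped iterate in place of the hard thresholding step, this turns the recursion (16) into an algorithm for the minimum rank problem.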

4.3 Infinite Unions of Infinite dimensional Subspaces 

We now show that non-trivial bi-Lipschitz embeddings also exist between infinite
dimensional spaces $\mathcal{H}$ and $L$, where $\mathcal{A}$ is an infinite union of infinite dimensional
subspaces in $\mathcal{H}$. We here consider the example from [29]. A continuous real valued
time series $x(t)$ is assumed to be band-limited, that is, its Fourier transform
$X(f)$ is assumed to be zero apart from the set $S \subset [-B_N, B_N]$. Furthermore,
the support of $X(f)$ is assumed to be 'sparse' in the sense that we can write $S$ as
the union of $K$ intervals of 'small' bandwidth $B_K$, i.e. $S \subset \bigcup_{k=1}^{K} [d_k, d_k + B_K]$,
where the $d_k$ are arbitrary scalars from the interval $[0, B_N - B_K]$. Note, due to
symmetry, we only consider the support in the positive interval $[0, B_N]$. Crucially,
we assume that $K B_K < B_N$, so that $X(f)$ is zero for most (in terms of
Lebesgue measure) $f$ in $[0, B_N]$. Fixing the support $S$, $X(f)$ and therefore $x(t)$
lie on a subspace of the space of all square integrable functions with bandwidth
$B_N$. If $K B_K < B_N$, then there are infinitely many distinct sets $S$ satisfying this
definition, so that $x(t)$ lies in the union of infinitely many infinite dimensional
subspaces.

Classical sampling theory tells us that there exist sampling operators that
map band-limited functions into $\ell_2$. What is more, these sampling operators
are not only one to one, but also isometric, that is, bi-Lipschitz embeddings
with $\alpha = 1$ and $\beta = 1$. These sampling operators are given by the Nyquist
sampling theorem, which only takes account of the bandwidth $B_N$, but does
not consider additional structure in $\mathcal{A}$. To improve on the classical theory, we
are thus interested in sampling schemes with a sampling rate that is less than
the Shannon rate.

³ See [27] for an exact definition of nearly isometrically distributed linear maps. An example
would again be if the matrix $\Phi$ has appropriately scaled i.i.d. Gaussian entries.

To this end, we show that there exist bi-Lipschitz embeddings of functions
from $\mathcal{A}$ into the space of band-limited signals with bandwidth $B_M$, where
$B_M < B_N$. Combining this embedding with the standard (isometric) Nyquist
sampling kernel for functions with bandwidth $B_M$ gives a stable sampling
scheme where the sampling rate is $2B_M$ instead of $2B_N$. The iterative projection
algorithm will therefore also be applicable to this sampling problem. It
is worth noting that the bi-Lipschitz embedding property shown here not only
guarantees invertibility of the sampling process, which was demonstrated for
the problem under consideration in [29], but also guarantees stability of this
inverse.

Our treatment here is theoretical in nature and is meant as an example to
show how bi-Lipschitz embeddings can be constructed in the infinite dimensional
setting. It is not meant as a fully fledged practical sampling method and many
practical issues remain to be addressed.

Compressed Sensing theory has shown that there is a constant $c$ such that
there are matrices $\Phi \in \mathbb{R}^{M \times N}$ with $M \le cK\ln(N/K)$ which are bi-Lipschitz
embeddings from the set of all $K$-sparse vectors in $\mathbb{R}^N$ to $\mathbb{R}^M$ [55]. Therefore,
assume $\Phi$ satisfies

$$\alpha\|x\|_2^2 \le \|\Phi x\|_2^2 \le \beta\|x\|_2^2 \tag{34}$$

for all vectors $x \in \mathbb{R}^N$ with no more than $2K$ non-zero elements.

The following sampling approach is basically that proposed in [21] and is
based on mixing of the spectrum of $x(t)$. It uses a matrix $\Phi \in \mathbb{R}^{M \times N}$ to define
this mixing procedure. Our contribution is to show that if the matrix $\Phi$ has
the bi-Lipschitz property with constants $\alpha$ and $\beta$, then so will this sampling
operator.

Let $\mathcal{A} \subset L_2([0, B_N])$ be the subset of the set of square integrable real valued
functions whose Fourier transform has positive support $S \subset [0, B_N]$, where $S$
is the union of no more than $K$ intervals of width no more than $B_K$. Let
$N = \lceil B_N/B_K \rceil$ and let $B_M = M B_K$. We then split the interval $[0, B_N]$ into
$N$ blocks of length $B_K$ as follows. Let $S_j$ be the interval $[(j-1)B_K, jB_K)$ for
integers $1 \le j \le N-1$ and $S_N = [(N-1)B_K, B_N]$. Similarly, let $\tilde{S}_i$ be the interval
$[(i-1)B_K, iB_K)$ for integers $1 \le i \le M-1$ and $\tilde{S}_M = [(M-1)B_K, MB_K]$. We can
then define a linear map from $x(t)$ to $y(t)$ by mapping the Fourier transform of
$x(t)$ into the Fourier transform of $y(t)$ as follows

$$Y(\tilde{S}_i) = \sum_{j=1}^{N} [\Phi]_{i,j}\, X(S_j), \tag{35}$$

where we use the convention that $X(f) = 0$ for $f > B_N$. In words, the new
function has the Fourier transform $Y$ (defined by symmetry also for $f < 0$)
which is constructed by concatenating $M$ functions of length $B_K$. Each of these
blocks is a weighted sum of the $N$ blocks of $X$, where the weights are the entries
of the matrix $\Phi$.
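
On a frequency grid, the block mixing in (35) reduces to a single matrix product. The sketch below is an illustration under the assumption that each block of width $B_K$ is sampled at the same number of equally spaced frequencies, so that row $j$ of `X_blocks` holds the samples of $X$ on $S_j$; the function names and the grid spacing `df` are hypothetical.

```python
import numpy as np


def mix_spectrum(X_blocks, Phi):
    """Row i of the result holds the samples of Y on block i:
    a weighted sum of the N input blocks with weights Phi[i, :]."""
    return Phi @ X_blocks


def squared_l2_norm(blocks, df):
    """Riemann sum approximation of the squared L2 norm of the
    spectrum obtained by concatenating the rows of `blocks`."""
    return df * np.sum(np.abs(blocks) ** 2)
```

Each column of `X_blocks` is one of the vectors stacked across blocks at a fixed frequency offset, so sparsity of the occupied blocks translates into sparsity of every column.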

We have the following result 

Theorem 6. Let $\mathcal{A} \subset L_2([0, B_N])$ be the subset of the set of square integrable
real valued functions whose Fourier transform has positive support $S \subset [0, B_N]$,
where $S$ is the union of no more than $K$ intervals of width no more than $B_K$.
If the matrix $\Phi \in \mathbb{R}^{M \times N}$ is bi-Lipschitz as a map from the set of all $K$-sparse
vectors in $\mathbb{R}^N$ into $\mathbb{R}^M$, with bi-Lipschitz constants $\alpha$ and $\beta$, then the map
defined by equation (35) is a bi-Lipschitz map from $\mathcal{A}$ to $L_2([0, B_M])$ such that

$$\alpha\|X_1 - X_2\|_2^2 \le \|Y_1 - Y_2\|_2^2 \le \beta\|X_1 - X_2\|_2^2 \tag{36}$$

for all $X_1, X_2 \in \mathcal{A}$.



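Proof. To see that this map is bi-Lipschitz from $\mathcal{A}$ to $L_2([0, B_M])$, consider
stacking up the blocks $Y(\tilde{S}_i)$ and $X(S_j)$ in two vectors. For $f \in [0, B_K)$ we use
$f_i = (i-1)B_K + f$ and $f_j = (j-1)B_K + f$ and define the vectors

$$y(f) = \begin{bmatrix} Y(f_1) \\ \vdots \\ Y(f_M) \end{bmatrix} = \Phi \begin{bmatrix} X(f_1) \\ \vdots \\ X(f_N) \end{bmatrix} = \Phi x(f). \tag{37}$$

This model is known as an infinite measurement vector model.
Using the norm of $L_2$, we can write

$$\begin{aligned}
\|Y_1 - Y_2\|_2^2 &= \int_0^{B_M} |Y_1(f) - Y_2(f)|^2 \, df \\
&= \sum_{i=1}^{M} \int_0^{B_K} |Y_1((i-1)B_K + f) - Y_2((i-1)B_K + f)|^2 \, df \\
&= \int_0^{B_K} \|y_1(f) - y_2(f)\|_2^2 \, df \\
&= \int_0^{B_K} \|\Phi x_1(f) - \Phi x_2(f)\|_2^2 \, df.
\end{aligned} \tag{38}$$

Noting that for fixed $f$, the vectors $x_1(f)$ and $x_2(f)$ are $K$-sparse, the bi-Lipschitz
property of $\Phi$ leads to the inequalities

$$\int_0^{B_K} \alpha\|x_1(f) - x_2(f)\|_2^2 \, df \le \int_0^{B_K} \|\Phi x_1(f) - \Phi x_2(f)\|_2^2 \, df \le \int_0^{B_K} \beta\|x_1(f) - x_2(f)\|_2^2 \, df, \tag{39}$$

so that

$$\alpha\|X_1 - X_2\|_2^2 \le \|Y_1 - Y_2\|_2^2 \le \beta\|X_1 - X_2\|_2^2, \tag{40}$$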



i.e. the mapping defined above satisfies the bi-Lipschitz condition with constants
$\alpha$ and $\beta$ defined by the bi-Lipschitz constants of the matrix $\Phi$. □

If we consider signals whose Fourier transform has support $S$ and if we let
$|S|$ be the size of the support, then, if we assume that the support is the union
of finitely many intervals of length $B_K$, we have $|S| = K B_K$ for some $K$.
If we then use $N = B_N/B_K$ and select $M$ and $B_M$ such that $B_M = M B_K$, then
the fact that there are bi-Lipschitz matrices with $M = cK\ln(N/K)$ together with
the above theorem implies the following corollary.

Corollary 7. Let $\mathcal{A} \subset L_2([0, B_N])$ be the subset of the set of square integrable
real valued functions whose Fourier transform has positive support $S \subset [0, B_N]$,
where $|S|$ is bounded and where $S$ is the union of finitely many intervals of finite
width. There exist bi-Lipschitz embeddings from $\mathcal{A}$ to $L_2([0, B_M])$ whenever

$$B_M \ge c\,|S|\ln\left(\frac{B_N}{|S|}\right),$$

where $c$ is some constant.

5 Conclusion 

We have here presented a unified framework that allows us to sample and re- 
construct signals that lie on or close to the union of subspaces. The bi-Lipschitz 
property is necessary to guarantee stable reconstruction. We have shown that 
bounds on the bi-Lipschitz constants $\alpha$ and $\beta$ are sufficient for the near optimal
reconstruction with the iterative projection algorithm. Whilst we have here
concentrated on the general theory for arbitrary union of subspaces models,
we have highlighted several more concrete examples from the literature. We
could also show that bandlimited signals with 'sparse' frequency support admit
sub-Nyquist sampling methods that are bi-Lipschitz.

We hope that this note offers the basis for the development of novel sampling 
approaches to several problems that fit into the union of subspaces framework. 
On the one hand, we have shown, using several examples, how bi-Lipschitz sampling
operators can be constructed. On the other hand, we have suggested an
algorithmic framework which can reconstruct signals with near optimal accuracy.
Whilst our contribution was theoretical in nature, our results point the
way toward practical strategies that can be developed further in order to tackle
a given sampling problem. To achieve this, four problems need to be addressed:
1) defining constraint sets $\mathcal{A}$ that capture relevant prior knowledge, 2) designing
realisable sampling operators that satisfy the bi-Lipschitz property, 3) implementing
efficient ways to store and manipulate the signals on a computer and
4) developing efficient algorithms to project onto the constraint set.